Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond
Authors
Abstract
This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive ε-optimal GPP (ε-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of ε-GPP with performance guarantee. We empirically demonstrate the effectiveness of our ε-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.
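The abstract states the core trick only at a high level: because the reward is Lipschitz continuous in the observation, an expectation over the uncountable set of candidate observations can be approximated to within a chosen tolerance ε using a finite grid. The sketch below is a minimal, one-step-lookahead illustration in Python/NumPy, not the authors' nonmyopic ε-GPP algorithm; helper names such as gp_posterior and expected_reward_eps are invented for this example, and the 1-Lipschitz improvement reward is just one admissible choice from the general reward class.

```python
# Minimal sketch (not the authors' implementation): when the reward is
# L-Lipschitz in the observation, the expected reward under the GP
# predictive distribution can be approximated by a finite grid whose
# spacing is eps / L, giving a discretization error of order eps.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-3):
    """Gaussian process posterior mean and variance at query inputs."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    K_ss = rbf_kernel(x_query, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 1e-12)

def improvement_reward(z, incumbent):
    """A 1-Lipschitz reward in the observation z (improvement over the incumbent)."""
    return np.maximum(z - incumbent, 0.0)

def expected_reward_eps(mean, var, incumbent, lipschitz=1.0, eps=0.01):
    """Approximate E[r(Z)] for Z ~ N(mean, var) on a grid with spacing eps / lipschitz."""
    std = np.sqrt(var)
    spacing = eps / lipschitz
    grid = np.arange(mean - 4 * std, mean + 4 * std + spacing, spacing)
    pdf = np.exp(-0.5 * ((grid - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
    weights = pdf * spacing
    weights /= weights.sum()  # guard against coarse grids at low-variance points
    return np.sum(weights * improvement_reward(grid, incumbent))

# Toy use: pick the next sampling location greedily (myopic, horizon 1).
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * x                 # hidden objective
x_train = rng.uniform(0, 2, size=4)
y_train = f(x_train)
candidates = np.linspace(0, 2, 200)
mean, var = gp_posterior(x_train, y_train, candidates)
incumbent = y_train.max()
scores = [expected_reward_eps(m, v, incumbent) for m, v in zip(mean, var)]
x_next = candidates[int(np.argmax(scores))]
print("next location to sample:", x_next)
```

The full ε-GPP policy applies this kind of discretization nonmyopically over a multi-stage lookahead, and the branch-and-bound anytime variant prunes the resulting search using reward bounds; neither extension is attempted in this sketch.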
Similar papers
Active reward learning with a novel acquisition function
Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework, wherein the robot simu...
Towards Practical Theory: Bayesian Optimization and Optimal Exploration
This thesis discusses novel principles to improve the theoretical analyses of a class of methods, aiming to provide theoretically driven yet practically useful methods. The thesis focuses on a class of methods, called bound-based search, which includes several planning algorithms (e.g., the A* algorithm and the UCT algorithm), several optimization methods (e.g., Bayesian optimization and Lipsch...
On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
The ability to compute an accurate reward function is essential for optimising a dialogue policy via reinforcement learning. In real-world applications, using explicit user feedback as the reward signal is often unreliable and costly to collect. This problem can be mitigated if the user’s intent is known in advance or data is available to pre-train a task success predictor off-line. In practice...
Verifying Controllers Against Adversarial Examples with Bayesian Optimization
Recent successes in reinforcement learning have led to the development of complex controllers for real-world robots. As these robots are deployed in safety-critical applications and interact with humans, it becomes critical to ensure safety in order to avoid causing harm. A first step in this direction is to test the controllers in simulation. To be able to do this, we need ...
Threshold Learning for Optimal Decision Making
Decision making under uncertainty is commonly modelled as a process of competitive stochastic evidence accumulation to threshold (the drift-diffusion model). However, it is unknown how animals learn these decision thresholds. We examine threshold learning by constructing a reward function that averages over many trials to Wald’s cost function that defines decision optimality. These rewards are ...
Publication date: 2016